# Hey look, it’s penguins

## Here’s a look at the Palmer penguins data set:

    ## Rows: 344
    ## Columns: 8
    ## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
    ## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
    ## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
    ## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
    ## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
    ## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
    ## $ sex               <fct> male, female, female, NA, female, male, female, male…
    ## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
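
The overview above has the shape of `glimpse()` output. Here is a minimal sketch that should reproduce it, assuming the data come from the `penguins` data frame in the `palmerpenguins` package (the import step itself is not shown in this write-up):

```r
# Assumption: the data set is `penguins` from the palmerpenguins package.
library(palmerpenguins)
library(dplyr)

# Print one line per column: name, type, and the first few values.
glimpse(penguins)
```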

The steps of data science are:

  1. Import
  2. Clean
  3. Visualize
  4. Model
  5. Report

We don’t need to do any real cleaning for this data set, although a few rows contain missing values (`NA`). But we can show some pretty pictures…
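
One way to draw such a picture is sketched below. The plot type is an assumption (overlapping density curves, one panel per species), as are the names `penguins_complete` and `body_mass_plot`. The original figure may have looked different.

```r
library(palmerpenguins)
library(ggplot2)

# Assumption: rows with missing sex or body mass are dropped for plotting only.
penguins_complete <- subset(penguins, !is.na(sex) & !is.na(body_mass_g))

# One panel per species, with overlapping body mass densities for each sex.
body_mass_plot <- ggplot(penguins_complete, aes(x = body_mass_g, fill = sex)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~species, ncol = 1) +
  labs(x = "Body mass (g)", y = "Density", fill = "Sex")

body_mass_plot
```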

Here, we see that male and female penguins have distinct body mass distributions for each species of penguin. Don’t believe your eyes?

Here are some stats:

| term | estimate | std.error | statistic | p.value |
|:--|--:|--:|--:|--:|
| (Intercept) | 3368.8356 | 36.21222 | 93.030365 | 0.0000000 |
| sexmale | 674.6575 | 51.21181 | 13.173867 | 0.0000000 |
| speciesChinstrap | 158.3703 | 64.24029 | 2.465279 | 0.0142039 |
| speciesGentoo | 1310.9058 | 54.42228 | 24.087666 | 0.0000000 |
| sexmale:speciesChinstrap | -262.8928 | 90.84950 | -2.893718 | 0.0040627 |
| sexmale:speciesGentoo | 130.4372 | 76.43559 | 1.706498 | 0.0888649 |
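
The term names in this table are consistent with a linear model of body mass on sex, species, and their interaction. Here is a sketch of how such a table might be produced, assuming `lm()` for the fit and `broom::tidy()` for the summary (the name `body_mass_model` is made up for illustration):

```r
library(palmerpenguins)
library(broom)

# Main effects of sex and species plus their interaction.
# lm() drops rows with missing values by default (na.action = na.omit).
body_mass_model <- lm(body_mass_g ~ sex * species, data = penguins)

# One row per coefficient: estimate, standard error, t statistic, and p-value.
tidy(body_mass_model)
```

The baseline levels are female and Adelie, so the intercept is the estimated mean body mass of female Adelie penguins, and each other term is an offset from that baseline.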

The equation for our model is:

\[ E( \operatorname{body\_mass\_g} ) = \alpha + \beta_{1}(\operatorname{sex}_{\operatorname{male}}) + \beta_{2}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{3}(\operatorname{species}_{\operatorname{Gentoo}}) + \beta_{4}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{5}(\operatorname{sex}_{\operatorname{male}} \times \operatorname{species}_{\operatorname{Gentoo}}) \]
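
As a worked example, take a male Gentoo penguin. The male and Gentoo indicators equal 1 and the Chinstrap indicator equals 0, so the \(\beta_{2}\) and \(\beta_{4}\) terms vanish. Plugging in the estimates from the table above gives the fitted mean body mass in grams:

\[ \hat{E}( \operatorname{body\_mass\_g} ) = 3368.8356 + 674.6575 + 1310.9058 + 130.4372 = 5484.8361 \]

For a female Adelie penguin every indicator is 0, so the fitted mean is just the intercept, 3368.8356 g.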